# Multimodal Task Processing

Openvla 7b Oft Finetuned Libero Spatial

OpenVLA-OFT is an optimized vision-language-action model that uses fine-tuning to significantly improve inference speed and task success rate over the base OpenVLA model.

Multimodal Fusion

Vitucano 2b8 V1

ViTucano is the first visual assistant pre-trained natively in Portuguese. It combines visual understanding with language capabilities and is suited to multimodal tasks such as image captioning and visual question answering.

Transformers Other

© 2025 AIbase